Combining partial order alignment and progressive multiple sequence alignment increases alignment speed and scalability to very large alignment problems
نویسندگان
چکیده
MOTIVATION Partial order alignment (POA) has been proposed as a new approach to multiple sequence alignment (MSA), which can be combined with existing methods such as progressive alignment. This is important for addressing problems both in the original version of POA (such as order sensitivity) and in standard progressive alignment programs (such as information loss in complex alignments, especially surrounding gap regions). RESULTS We have developed a new Partial Order-Partial Order alignment algorithm that optimally aligns a pair of MSAs and which therefore can be applied directly to progressive alignment methods such as CLUSTAL. Using this algorithm, we show the combined Progressive POA alignment method yields results comparable with the best available MSA programs (CLUSTALW, DIALIGN2, T-COFFEE) but is far faster. For example, depending on the level of sequence similarity, aligning 1000 sequences, each 500 amino acids long, took 15 min (at 90% average identity) to 44 min (at 30% identity) on a standard PC. For large alignments, Progressive POA was 10-30 times faster than the fastest of the three previous methods (CLUSTALW). These data suggest that POA-based methods can scale to much larger alignment problems than possible for previous methods. AVAILABILITY The POA source code is available at http://www.bioinformatics.ucla.edu/poa
منابع مشابه
An Application of the ABS LX Algorithm to Multiple Sequence Alignment
We present an application of ABS algorithms for multiple sequence alignment (MSA). The Markov decision process (MDP) based model leads to a linear programming problem (LPP), whose solution is linked to a suggested alignment. The important features of our work include the facility of alignment of multiple sequences simultaneously and no limit for the length of the sequences. Our goal here is to ...
متن کاملgpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences
Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...
متن کاملEfficient Construction of accurate Multiple alignments and Large-Scale phylogenies
A central focus of computational biology is to organize and make use of vast stores of molecular sequence data. Two of the most studied and fundamental problems in the field are sequence alignment and phylogeny inference. The problem of multiple sequence alignment is to take a set of DNA, RNA, or protein sequences and identify related segments of these sequences. Perhaps the most common use of ...
متن کاملRecent developments in the MAFFT multiple sequence alignment program
The accuracy and scalability of multiple sequence alignment (MSA) of DNAs and proteins have long been and are still important issues in bioinformatics. To rapidly construct a reasonable MSA, we developed the initial version of the MAFFT program in 2002. MSA software is now facing greater challenges in both scalability and accuracy than those of 5 years ago. As increasing amounts of sequence dat...
متن کاملMultiple sequence alignment using partial order graphs
MOTIVATION Progressive Multiple Sequence Alignment (MSA) methods depend on reducing an MSA to a linear profile for each alignment step. However, this leads to loss of information needed for accurate alignment, and gap scoring artifacts. RESULTS We present a graph representation of an MSA that can itself be aligned directly by pairwise dynamic programming, eliminating the need to reduce the MS...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Bioinformatics
دوره 20 10 شماره
صفحات -
تاریخ انتشار 2004